Learning, Using Examples, to Translate Phrases and Sentences to Meanings

Authors

  • Adam Davis Kraft
  • Arthur C. Smith
Abstract

Before we can create intelligent systems that exhibit the versatility of the human intellect, we must understand how our command of language enables the uniquely broad scope of our reasoning ability. To pursue such an understanding, we must discover the ways by which language gives rise to representations that, in turn, serve as the building blocks of models capturing the constraints and regularities of our environment. The work described in this thesis constitutes a step toward this goal. I have combined aspects of Winston's Arch-Learning methodology with implementations of three powerful representations — Lexical Conceptual Semantics [Jackendoff 1983], Transition Spaces [Borchardt 1993], and Thread Memory [Vaina and Greenblatt 1979] — in a system that learns to instantiate semantic descriptions from language based on a sequence of examples. My program, Lance, builds models of the correspondences between parse trees and semantic descriptions by generalizing from a sequence of pairs of sentence fragments and descriptions. Additionally, counterexamples for one type of correspondence model may be generated from examples of similar models in order to facilitate learning by near miss. As a result, my system can learn such constraints as: in order for a sentence to convey a transition, it must contain a verb that means “change,” “appear,” or “disappear.” In this work I developed an approach based on the presentation of parse trees paired with instantiated representations, following the Arch-Learning paradigm, and implemented Lance, a 12,000-line Java program. I demonstrated that from a training sequence of 95 examples, Lance learned 27 models of THINGS, PARTS, PLACES, PATH-ELEMENTS, TRAJECTORY-SPACES, TRANSITION-SPACES, CAUSES, and IS-A relations.
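The abstract's near-miss idea can be illustrated with a toy sketch in the Winston Arch-Learning style. This is a hypothetical illustration, not Lance's actual API or representation: positive examples generalize the set of verbs known to convey a transition, while a near miss (a sentence that fails to convey a transition) teaches the model that a qualifying verb is a *required* feature.

```python
# Hypothetical sketch of near-miss learning for the constraint:
# "a sentence conveys a transition only if it contains a verb meaning
#  'change', 'appear', or 'disappear'". Names are illustrative only.

class TransitionModel:
    def __init__(self):
        self.allowed_verbs = set()   # verbs seen in positive examples
        self.verb_required = False   # becomes True after a near miss

    def learn_positive(self, verbs):
        """Generalize: grow the set of verbs observed to convey a transition."""
        self.allowed_verbs |= set(verbs)

    def learn_near_miss(self, verbs):
        """A near miss differs from positives in one salient feature.
        If none of its verbs are in the allowed set, conclude that a
        qualifying verb is a required feature of the model."""
        if not (set(verbs) & self.allowed_verbs):
            self.verb_required = True

    def matches(self, verbs):
        """Does a sentence with these verbs satisfy the learned model?"""
        if self.verb_required:
            return bool(set(verbs) & self.allowed_verbs)
        return True  # no constraint learned yet


model = TransitionModel()
model.learn_positive({"change"})
model.learn_positive({"appear", "disappear"})
model.learn_near_miss({"resemble"})      # near miss: no transition verb
print(model.matches({"change"}))         # True
print(model.matches({"resemble"}))       # False
```

The key design point mirrors the Arch-Learning paradigm: positive examples can only broaden the model, while near misses are the sole source of hard constraints, which keeps each training example's contribution unambiguous.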


Similar papers

Using Inverse λ and Generalization to Translate English to Formal Languages

We present a system to translate natural language sentences to formulas in a formal or a knowledge representation language. Our system uses two inverse λ-calculus operators and, using them, can take as input the semantic representation of some words, phrases, and sentences and from that derive the semantic representation of other words and phrases. Our inverse λ operator works on many formal langu...


Automatic Idiom Identification in Wiktionary

Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed ...


Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation

Active learning is a framework that makes it possible to efficiently train statistical models by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when annotators translate short phrases instead of full sentences. However...


Determining the Boundaries and Types of Syntactic Phrases in Farsi Texts

Text tokenization is the process of splitting text into meaningful tokens such as words, phrases, sentences, etc. Tokenization of syntactic phrases, known as chunking, is an important preprocessing step in many applications such as machine translation, information retrieval, text-to-speech, etc. In this paper, chunking of Farsi texts is done using statistical and learning methods, and the grammat...



Journal:

Volume   Issue 

Pages  -

Publication date: 2007